Efficient Exploration in Resource-Restricted Reinforcement Learning

Authors

Abstract

In many real-world applications of reinforcement learning (RL), performing actions requires consuming certain types of resources that are non-replenishable within each episode. Typical examples include robotic control with limited energy and video games with consumable items. In tasks with non-replenishable resources, we observe that popular RL methods such as soft actor critic suffer from poor sample efficiency. The major reason is that they tend to exhaust resources fast, and the subsequent exploration is then severely restricted due to the absence of resources. To address this challenge, we first formalize the aforementioned problem as resource-restricted reinforcement learning, and then propose a novel resource-aware exploration bonus (RAEB) to make reasonable usage of resources. An appealing feature of RAEB is that it can significantly reduce unnecessary resource-consuming trials while effectively encouraging the agent to explore unvisited states. Experiments demonstrate that the proposed method outperforms state-of-the-art exploration strategies in resource-restricted reinforcement learning environments, improving sample efficiency by up to an order of magnitude.
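The abstract does not give the RAEB formula itself, but the idea it describes can be illustrated with a minimal sketch: take an ordinary count-based novelty bonus and attenuate it when an action would consume scarce resources, so the agent is still drawn to unvisited states but is discouraged from wasteful resource-consuming trials. The class name, the `1/sqrt(count)` novelty term, and the attenuation factor below are all illustrative assumptions, not the paper's actual method.

```python
from collections import defaultdict
import math

class ResourceAwareBonusSketch:
    """Illustrative exploration bonus modulated by remaining resources.

    NOTE: a hypothetical sketch of the general idea behind a resource-aware
    exploration bonus, not the RAEB formula from the paper. A count-based
    novelty term (1/sqrt(visit count)) is scaled down when an action's
    resource cost is large relative to the remaining budget.
    """

    def __init__(self, beta=1.0):
        self.beta = beta                # overall bonus scale (assumed hyperparameter)
        self.counts = defaultdict(int)  # per-state visit counts

    def bonus(self, state, remaining, cost):
        """Return the exploration bonus for visiting `state` via an action
        that consumes `cost` units out of `remaining` resources."""
        self.counts[state] += 1
        novelty = 1.0 / math.sqrt(self.counts[state])
        # Free actions keep the full novelty bonus; costly actions are
        # attenuated more strongly as the remaining budget shrinks.
        resource_factor = 1.0 if cost == 0 else remaining / (remaining + cost)
        return self.beta * novelty * resource_factor


b = ResourceAwareBonusSketch()
first = b.bonus("s0", remaining=10, cost=0)   # novel state, free action
second = b.bonus("s0", remaining=10, cost=10)  # revisit, costly action
```

Under this toy scheme, the first (free) visit gets the full bonus of 1.0, while the costly revisit is doubly discounted: by familiarity (1/sqrt(2)) and by the resource factor (10/20 = 0.5).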



Similar Articles

Efficient Exploration in Reinforcement Learning

An agent acting in a world makes observations, takes actions, and receives rewards for the actions taken. Given a history of such interactions, the agent must make the next choice of action so as to maximize the long term sum of rewards. To do this well, an agent may take suboptimal actions which allow it to gather the information necessary to later take optimal or near-optimal actions with res...


Efficient Exploration for Reinforcement Learning

Reinforcement learning is often regarded as one of the hardest problems in machine learning. Algorithms for solving these problems often require copious resources in comparison to other problems, and will often fail for no obvious reason. This report surveys a set of algorithms for various reinforcement learning problems that are known to terminate with a good solution after a number of interacti...


Resource Constrained Exploration in Reinforcement Learning

This paper examines temporal difference reinforcement learning (RL) with adaptive and directed exploration for resource-limited missions. The scenario considered is for an energy-limited agent which must explore an unknown region to find new energy sources. The presented algorithm uses a Gaussian Process (GP) regression model to estimate the value function in an RL framework. However, to avoid ...


Learning to soar: Resource-constrained exploration in reinforcement learning

This paper examines temporal difference reinforcement learning with adaptive and directed exploration for resource-limited missions. The scenario considered is that of an unpowered aerial glider learning to perform energy-gaining flight trajectories in a thermal updraft. The presented algorithm, eGP-SARSA(λ), uses a Gaussian process regression model to estimate the value function in a reinforcem...


Efficient Reinforcement Learning via Initial Pure Exploration

In several realistic situations, an interactive learning agent can practice and refine its strategy before going on to be evaluated. For instance, consider a student preparing for a series of tests. She would typically take a few practice tests to know which areas she needs to improve upon. Based on the scores she obtains in these practice tests, she would formulate a strategy for maximizing he...



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i8.26224